Features extraction and temporal segmentation
Abstract
This paper deals with the temporal segmentation of acoustic signals and with feature extraction. Segmentation and feature extraction are intended as a first step for sound signal representation, coding, transformation, indexing and multimedia. Three interdependent segmentation schemes are defined, which correspond to different levels of signal attributes. A complete segmentation and feature extraction system is shown. Applications and results on various examples are presented. The paper is divided into four sections: a definition of the segmentation schemes, followed by a description of the techniques used for each of them.

1 – The three segmentation schemes

The first scheme, named the source scheme, mainly concerns the distinction between speech, singing voice and instrumental parts on movie sound tracks and radio broadcasts. Other sounds, such as street sounds and machine noise, are also considered. The data obtained are propagated to the following two schemes.

The second scheme refers to vibrato. Vibrato must be suppressed from the source signal (singing voice and instrumental parts only) before the following scheme can be performed.

This last scheme leads to segmentation into notes or into phones, or more generally into stable or transient sounds, according to the nature of the sound: instrumental part, singing voice excerpt or speech (when speech is identified in scheme 1, there is no need to try to detect vibrato). The data propagation between schemes (1 → 2 → 3 or 1 → 3) shows the dependencies between the results.

Other segmentation characteristics (some are described in the features below), such as short silences/sound (energy transients), transitory/steady, voiced/unvoiced (voicing coefficient), harmonic (inharmonicity coefficient) and so forth, will follow. Most of these characteristics are needed for getting notes and phones segmentation results on scheme 2.

2 – Source segmentation scheme

This work focuses on the identification of two sound classes: music (that is to say, singing voice and/or instrumental part) and speech. Accordingly, features have been examined: they are intended to measure distinct properties of speech and music. They are combined into several multidimensional classification frameworks. The preliminary results on real data are given here. Six features have been examined and retained [Scheirer and Slaney 97]. They are based on: The spectral "flux", which is defined as the 2-norm of the difference between the magnitudes of the Short Time Fourier Transform (STFT) spectrum evaluated at two successive sound frames. Notice that each evaluation of the STFT is …
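The spectral flux described above can be illustrated with a minimal sketch. It assumes NumPy, a Hann window, a frame length of 1024 samples and a hop of 512 samples. None of these choices comes from the abstract, and the function name `spectral_flux` is hypothetical. The abstract's remark about how each STFT evaluation is treated is cut off, so no normalisation is applied here.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """2-norm of the difference between successive STFT magnitude frames.

    frame_len, hop and the Hann window are illustrative assumptions,
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len + hop:
        raise ValueError("signal too short for two frames")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrum of each frame (one-sided FFT of a real signal).
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Difference between consecutive frames, then the 2-norm per pair.
    return np.linalg.norm(np.diff(mags, axis=0), axis=1)
```

For a stationary tone the flux stays close to zero, while an abrupt onset produces a clear peak. That behaviour is what makes the measure a plausible cue for telling apart signals whose spectra change at different rates.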
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters temporally tracked and temporal tra...
Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...
Spatial to temporal conversion of images using a pulse-coupled neural network
A new electronic model of a Pulse-Coupled Neural Network is proposed. This model exhibits very interesting features such as segmentation, feature extraction, orientation independence and noise tolerance. Segmentation means that the output pattern depends strongly on the spatial location of the pixels with respect to one another. Feature extraction means that if the input image includes several patte...
Automatic extraction of moving objects for head-shoulder video sequence
Recently, video object extraction has received great attention because it is a critical technique in object-based video processing. This paper presents a temporal-to-spatial segmentation technique to extract an object from a video sequence. The temporal phase employs a simple blockwise temporal-activity measure to approximately locate the object boundary. And then a block-based maximum a posterior...
Publication date: 1998